
    Models and Algorithms for Sorting Permutations with Tandem Duplication and Random Loss

    A central topic of evolutionary biology is the inference of phylogeny, i.e., the evolutionary history of species. A powerful tool for the inference of such phylogenetic relationships is the arrangement of the genes in mitochondrial genomes. The rationale is that these gene arrangements are subject to different types of mutations in the course of evolution; hence, a high similarity in the gene arrangement of two species indicates a close evolutionary relation. Metazoan mitochondrial gene arrangements are particularly well suited for such phylogenetic studies as they are available for a wide range of species, their gene content is almost invariant, and they are usually free of duplicates. With these properties, gene arrangements of mitochondrial genomes are modeled by permutations in which each element represents a gene, i.e., a specific genetic sequence. The mutations that shape the gene arrangement of genomes are then represented by operations that rearrange elements in permutations, so-called genome rearrangements, and thereby bridge the gap between evolutionary biology and optimization. Many problems of phylogeny inference can be formulated as challenging combinatorial optimization problems, which makes this research area especially interesting for computer scientists. The most prominent examples of such optimization problems are the sorting problem and the distance problem. While the sorting problem asks for a minimum-length sequence of rearrangements that transforms one given permutation into another, i.e., a hypothetical scenario of gene order evolution, the distance problem asks only for the length of such a sequence. This minimum length is called the distance and is used as a (dis)similarity measure quantifying evolutionary relatedness. Most evolutionary changes occurring in gene arrangements of mitochondrial genomes can be explained by the tandem duplication random loss (TDRL) genome rearrangement model. A TDRL consists of a duplication of a consecutive set of genes in tandem followed by a random loss of one copy of each duplicated gene. Despite the importance of the TDRL rearrangement in mitochondrial evolution, its combinatorial properties have rarely been studied. In addition, models of genome rearrangement that include all types of rearrangement relevant for mitochondrial genomes, i.e., inversions, transpositions, inverse transpositions, and TDRLs, while remaining computationally tractable, are rare. Nevertheless, especially for metazoan gene arrangements, the TDRL rearrangement should be considered for the reconstruction of phylogeny. Realizing that a better understanding of the TDRL model is indispensable for the study of mitochondrial gene arrangements, the central theme of this thesis is to broaden the horizon of TDRL genome rearrangements with respect to mitochondrial genome evolution. For this purpose, this thesis provides combinatorial properties of the TDRL model and its variants as well as efficient methods for a plausible reconstruction of rearrangement scenarios between gene arrangements. The proposed methods consider all types of genome rearrangements that predominantly occur during mitochondrial evolution.
    More precisely, the main contributions of this thesis are as follows. The distance problem and the sorting problem for the TDRL model are further examined with respect to circular permutations, a formal concept that reflects the circular structure of mitochondrial genomes. As a result, a closed formula for the distance is provided. Recently, evidence for a variant of the TDRL rearrangement model in which the duplicated set of genes is additionally inverted has been found. Initiating the algorithmic study of this new rearrangement model on a certain type of permutations, a closed formula solving the distance problem is proposed as well as a quasilinear-time algorithm that solves the corresponding sorting problem. The assumption that only one type of genome rearrangement has occurred during the evolution of certain gene arrangements is most likely unrealistic; e.g., at least three types of rearrangements on top of the TDRL rearrangement have to be considered for the evolution of metazoan mitochondrial genomes. Therefore, three different biologically motivated constraints are taken into account in this thesis in order to produce plausible evolutionary rearrangement scenarios. The first constraint extends the considered set of genome rearrangements to a model that covers all four common types of mitochondrial genome rearrangement. For this 4-type model, a sharp lower bound and several close additive upper bounds on the distance are developed. As a byproduct, a polynomial-time approximation algorithm for the corresponding sorting problem is provided that guarantees the computation of pairwise rearrangement scenarios deviating from a minimum-length scenario by at most two rearrangement operations. The second biologically motivated constraint is the relative frequency of the different types of rearrangements occurring during evolution. The frequency is modeled by a weighting scheme on the 4-type model in which every rearrangement is weighted with respect to its type. The resulting NP-hard sorting problem is then solved by means of a polynomial-size integer linear program. The third biologically motivated constraint is that certain subsets of genes are often found in close proximity in the gene arrangements of many different species. This observation is reflected by demanding that rearrangement scenarios preserve certain groups of genes, which are modeled by common intervals of permutations. In order to solve the sorting problem that considers all three types of biologically motivated constraints, the exact dynamic programming algorithm CREx2 is proposed. CREx2 has a linear runtime for a large class of problem instances. Otherwise, two versions of CREx2 are provided: the first provides exact solutions but has an exponential worst-case runtime, and the second provides approximate solutions efficiently. CREx2 is evaluated in an empirical study on simulated and real biological mitochondrial gene arrangements.
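    To make the TDRL operation itself concrete, the following minimal Python sketch (not taken from the thesis; the function name, the list representation of permutations, and the example values are purely illustrative) duplicates a segment of a permutation in tandem and then keeps exactly one copy of each duplicated element:

    import random

    def tdrl(perm, i, j, keep_first=None):
        # Duplicate the segment perm[i:j] in tandem, then keep exactly one
        # copy of each duplicated element: keep_first[k] == True keeps the
        # k-th element in the left copy, False keeps it in the right copy.
        # If keep_first is None, the losses are chosen uniformly at random.
        segment = perm[i:j]
        if keep_first is None:
            keep_first = [random.choice((True, False)) for _ in segment]
        left = [g for g, keep in zip(segment, keep_first) if keep]
        right = [g for g, keep in zip(segment, keep_first) if not keep]
        return perm[:i] + left + right + perm[j:]

    # A single TDRL over the whole permutation already sorts [3, 1, 4, 2]:
    # keep 1 and 2 in the left copy, 3 and 4 in the right copy.
    print(tdrl([3, 1, 4, 2], 0, 4, keep_first=[False, True, False, True]))
    # -> [1, 2, 3, 4]

    Note that the elements kept in the left copy and those kept in the right copy each retain their relative order; this interleaving structure is what combinatorial analyses of the TDRL model build on.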

    Economic Genome Assembly from Low Coverage Illumina and Nanopore Data

    Ongoing developments in genome sequencing have caused a fundamental paradigm shift in the field in recent years. With ever lower sequencing costs, projects are no longer limited by available raw data, but rather by computational demands. The high complexity of eukaryotic genomes, together with increasing data sizes, creates unique demands on methods to assemble full genomes. We describe a new approach to assemble genomes from a combination of low-coverage short and long reads. LazyB starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs, which is then reduced to a long-read overlap graph G. Instead of the more conventional approach of removing tips, bubbles, and other local features, LazyB successively extracts subgraphs whose global properties approach a disjoint union of paths. First, a consistently oriented subgraph is extracted, which in a second step is reduced to a directed acyclic graph. In the next step, properties of proper interval graphs are used to extract contigs as maximum-weight paths. These are translated into genomic sequences only in the final step. A prototype implementation of LazyB, written entirely in Python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort. Our findings demonstrate a new low-cost method that enables the assembly of even large genomes with low computational effort.
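    The contig-extraction step ultimately reduces to finding heaviest paths in a directed acyclic graph. The following Python sketch shows the standard dynamic program for this subproblem (a generic textbook version under the assumption that the nodes are already topologically sorted, not LazyB's actual implementation):

    from collections import defaultdict

    def max_weight_path(nodes, edges):
        # Heaviest path in a DAG; `nodes` must be in topological order,
        # `edges` is a list of (u, v, weight) triples.
        best = {v: 0.0 for v in nodes}   # best weight of a path ending in v
        pred = {v: None for v in nodes}
        incoming = defaultdict(list)
        for u, v, w in edges:
            incoming[v].append((u, w))
        for v in nodes:                  # relax edges in topological order
            for u, w in incoming[v]:
                if best[u] + w > best[v]:
                    best[v] = best[u] + w
                    pred[v] = u
        v = max(nodes, key=lambda x: best[x])
        path = []
        while v is not None:             # trace the path back to its start
            path.append(v)
            v = pred[v]
        return path[::-1]

    # Toy overlap graph: the path a -> b -> d outweighs a -> c -> d.
    print(max_weight_path(["a", "b", "c", "d"],
                          [("a", "b", 5), ("a", "c", 1),
                           ("b", "d", 4), ("c", "d", 1)]))
    # -> ['a', 'b', 'd']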

    Posttranslational protein transport in yeast reconstituted with a purified complex of Sec proteins and Kar2p

    We have reproduced the posttranslational mode of protein translocation across the endoplasmic reticulum membrane with reconstituted proteoliposomes containing a purified complex of seven yeast proteins. This Sec complex includes a heterotrimeric Sec61p complex, homologous to that in mammals, as well as all other membrane proteins found in genetic screens for translocation components. Efficient posttranslational translocation also requires the addition of lumenal Kar2p (BiP) and ATP. The trimeric Sec61p complex also exists as a separate entity that, in contrast with the large Sec complex, is associated with membrane-bound ribosomes. We therefore hypothesize that distinct membrane protein complexes function in co- and posttranslational translocation pathways.

    SPITZER: Accretion in Low Mass Stars and Brown Dwarfs in the Lambda Orionis Cluster

    We present multi-wavelength optical and infrared photometry of 170 previously known low-mass stars and brown dwarfs of the 5 Myr Collinder 69 cluster (Lambda Orionis). The new photometry supports cluster membership for most of them, with less than 15% of the previous candidates identified as probable non-members. The near-infrared photometry allows us to identify stars with IR excesses, and we find that the Class II population is very large, around 25% for stars (in the spectral range M0–M6.5) and 40% for brown dwarfs, down to 0.04 Msun, despite the fact that the Hα equivalent width is low for a significant fraction of them. In addition, there are a number of substellar objects, classified as Class III, that have optically thin disks. The Class II members are distributed in an inhomogeneous way, lying preferentially in a filament running toward the south-east. The IR excesses for the Collinder 69 members range from pure Class II (flat or nearly flat spectra longward of 1 micron), to transition disks with no near-IR excess but excesses beginning within the IRAC wavelength range, to two stars with excess only detected at 24 micron. Collinder 69 thus appears to be at an age where it provides a natural laboratory for the study of primordial disks and their dissipation. (ApJ, in press.)
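    As a rough illustration of how a classification by excess onset could be encoded, the following Python sketch maps the shortest wavelength at which a significant excess is detected to a disk category; the thresholds, band values, and function name are illustrative assumptions, not the selection criteria used in the study:

    def classify_sed(excess_bands):
        # Classify a rough disk type from the set of wavelengths (in micron)
        # at which a significant IR excess is detected.  The thresholds are
        # illustrative placeholders only.
        if not excess_bands:
            return "Class III (no detectable excess)"
        onset = min(excess_bands)
        if onset <= 2.2:                 # excess already in the near-IR
            return "Class II"
        if onset <= 8.0:                 # excess starts within the IRAC bands
            return "transition disk"
        return "excess only in the mid-IR (e.g. 24 micron)"

    print(classify_sed({3.6, 4.5, 5.8, 8.0, 24.0}))   # -> transition disk
    print(classify_sed({1.2, 2.2, 8.0, 24.0}))        # -> Class II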

    Relative Timing Information and Orthology in Evolutionary Scenarios

    Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestor of two extant genes (leaves of T) and the last common ancestor of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestor coincides with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial-time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.
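    Given a time-stamped scenario, the three graphs can be read off by pairwise comparison of divergence times. The following Python sketch outlines this construction; the function and argument names are assumptions made for illustration, and the caller must supply the gene-to-species map and the two time functions:

    from itertools import combinations

    def divergence_time_graphs(genes, species_of, gene_lca_time, species_lca_time):
        # Partition all gene pairs into EDT / LDT / PDT edge sets by comparing
        # the age of their last common ancestor in the gene tree with the age
        # of the last common ancestor of their species in the species tree.
        # Ages are assumed to increase toward the past; how a pair of genes
        # from the same species is timed depends on the chosen convention.
        edt, ldt, pdt = set(), set(), set()
        for x, y in combinations(genes, 2):
            t_gene = gene_lca_time(x, y)
            t_species = species_lca_time(species_of[x], species_of[y])
            if t_gene == t_species:
                edt.add(frozenset((x, y)))
            elif t_gene < t_species:     # genes diverged later (more recently)
                ldt.add(frozenset((x, y)))
            else:                        # genes diverged prior to their species
                pdt.add(frozenset((x, y)))
        return edt, ldt, pdt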

    Robust Scheduling in Construction Engineering

    In construction engineering, a schedule’s input data, which is usually not exactly known in the planning phase, is treated as deterministic when the schedule is generated. As a result, construction schedules become unreliable and deadlines are often not met. While the optimization of construction schedules with respect to costs and makespan has been a matter of research in the past decades, the optimization of the robustness of construction schedules has received little attention. In this paper, the effects of uncertainties inherent in the input data of construction schedules are discussed. Possibilities to improve the reliability of construction schedules are investigated by considering alternative processes for certain tasks and by identifying the combination of processes that generates the most robust schedule with respect to the makespan of a construction project.
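    As a small illustration of the underlying idea, the following Python sketch scores each alternative process for a task by the estimated probability that the resulting schedule meets a deadline under uncertain durations; the triangular distributions, task names, and numbers are assumptions made for illustration, not the method or data of the paper:

    import random

    def makespan(durations, predecessors):
        # Critical-path makespan; `durations` maps task -> duration and its
        # keys are assumed to be listed in a valid precedence order.
        finish = {}
        for task, d in durations.items():
            start = max((finish[p] for p in predecessors.get(task, [])), default=0.0)
            finish[task] = start + d
        return max(finish.values())

    def robustness(processes, predecessors, deadline, samples=2000):
        # Estimate the probability of meeting the deadline when each task's
        # duration is drawn from the (min, mode, max) triangle of its process.
        hits = 0
        for _ in range(samples):
            durations = {t: random.triangular(lo, hi, mode)
                         for t, (lo, mode, hi) in processes.items()}
            hits += makespan(durations, predecessors) <= deadline
        return hits / samples

    # Task B can be executed by a fast-but-risky or a slow-but-stable process;
    # the more robust choice is the one with the higher probability of meeting
    # the deadline, even if its expected duration is longer.
    predecessors = {"B": ["A"], "C": ["B"]}
    base = {"A": (2, 3, 4), "B": None, "C": (1, 2, 3)}
    for option in [(3, 4, 9), (5, 6, 7)]:
        print(option, robustness({**base, "B": option}, predecessors, deadline=12))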